15 research outputs found

Interoperability and FAIRness through a novel combination of Web technologies

Author: Bolleman Jerven T.
Bonino da Silva Santos Luiz Olavo
Ciccarese Paolo
Clark Tim
Dumontier Michel
Gavai Anand
Gray Alasdair J. G.
Kaliyaperumal Rajaram
Kelpin Fleur D. L.
Kuzniar Arnold
Schultes Erik A.
Swertz Morris A.
Thompson Mark
van Mulligen Erik M.
Verborgh Ruben
Wilkinson Mark D.
Publication venue: 'PeerJ'
Publication date: 01/01/2017

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

Maastricht University Research Portal

Heriot Watt Pure

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Ghent University Academic Bibliography

Directory of Open Access Journals

Dissertations of the University of Groningen

The health care and life sciences community profile for dataset descriptions

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.

Carleton University's Institutional Repository

FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

Author: Bolleman Jerven T.
Publication date: 18/05/2017

Ezid

Property Graph vs RDF triple store : a comparison on glycan substructure search

Author: Alocci Davide
Bolleman Jerven T
Campbell Matthew P
Horlacher Oliver
Lisacek Frederique
Mariethoz Julien
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

Resource Description Framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, graph databases are possible software solutions for storing glycan structures, and graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for storage in an RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology adapted to solving glycan substructure search and other comparable issues.
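
To make the graph encoding in this abstract concrete, here is a minimal in-memory sketch of substructure search over a glycan tree. It is not the paper's implementation, which runs SPARQL and Cypher queries against database engines. The `Residue` class, the linkage strings, and the `contains` helper are all illustrative assumptions, and the example structure is the familiar trimannosyl N-glycan core.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    """One building block (node); each child is reached via a linkage (edge)."""
    name: str
    children: list = field(default_factory=list)  # list of (linkage, Residue)

    def add(self, linkage, name):
        child = Residue(name)
        self.children.append((linkage, child))
        return child


def matches_at(pattern, node):
    """True if `pattern` can be embedded with its root mapped onto `node`."""
    if pattern.name != node.name:
        return False
    return _assign(pattern.children, node.children, frozenset())


def _assign(p_children, n_children, used):
    # Backtracking: every pattern child needs its own matching target child.
    if not p_children:
        return True
    (p_link, p_child), rest = p_children[0], p_children[1:]
    for i, (n_link, n_child) in enumerate(n_children):
        if i in used or n_link != p_link:
            continue
        if matches_at(p_child, n_child) and _assign(rest, n_children, used | {i}):
            return True
    return False


def contains(target, pattern):
    """Try the pattern at every node of the target tree."""
    stack = [target]
    while stack:
        node = stack.pop()
        if matches_at(pattern, node):
            return True
        stack.extend(child for _, child in node.children)
    return False


# Target: Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc, rooted at the reducing end.
core = Residue("GlcNAc")
branch_point = core.add("b1-4", "GlcNAc").add("b1-4", "Man")
branch_point.add("a1-3", "Man")
branch_point.add("a1-6", "Man")

# Query: a mannose carrying both an a1-3 and an a1-6 mannose.
query = Residue("Man")
query.add("a1-3", "Man")
query.add("a1-6", "Man")

print(contains(core, query))
```

In a graph database the same question becomes a pattern in the query language rather than a hand-written traversal, which is the comparison the study measures.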

Directory of Open Access Journals

PubMed Central

Macquarie University ResearchOnline

Archive ouverte UNIGE

Ontology overview.

Author: Davide Alocci (836805)
Frederique Lisacek (1462591)
Jerven T. Bolleman (836808)
Julien Mariethoz (836806)
Matthew P. Campbell (836809)
Oliver Horlacher (836807)

Overview of the ontology developed for translating glycan structures into RDF/semantic triples. The figure shows all the predicates and entities used to define a glycan structure in the RDF triple store.

FigShare

Average query time.

Author: Davide Alocci (836805)
Frederique Lisacek (1462591)
Jerven T. Bolleman (836808)
Julien Mariethoz (836806)
Matthew P. Campbell (836809)
Oliver Horlacher (836807)

The mean response time of each query in both sets is shown in two bar charts. Panel (A) shows the mean query times for the first set and panel (B) shows those for the second set. The column assigned to Virtuoso in the second set of queries is empty because we could not record any data due to a problem in running large and very large queries.

FigShare

Query building example.

Author: Davide Alocci (836805)
Frederique Lisacek (1462591)
Jerven T. Bolleman (836808)
Julien Mariethoz (836806)
Matthew P. Campbell (836809)
Oliver Horlacher (836807)

A. Example of use of the RDF model to build a SPARQL query from a glycan substructure, focussing on the translation process. The prefix part of the query is omitted but further detailed examples are provided in the S1 File (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0144578#pone.0144578.s001). B. The same example is shown with building a Cypher query, the native language in Neo4J. Similarly, additional examples are provided in the S1 File.

FigShare

Glycan CFG encoding and graph encoding.

Author: Davide Alocci (836805)
Frederique Lisacek (1462591)
Jerven T. Bolleman (836808)
Julien Mariethoz (836806)
Matthew P. Campbell (836809)
Oliver Horlacher (836807)

On the left-hand side, a glycan structure encoded with CFG nomenclature is presented, while the right-hand side shows the same structure translated into a graph. Each monosaccharide or substituent becomes a node and each glycosidic bond becomes an edge in the graph. To avoid any loss of information, all the properties of each monosaccharide or substituent are converted into node properties, whereas glycosidic bond properties are translated into edge properties. For clarity, the colour code associated with each monosaccharide type is preserved across the images.

FigShare

FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

Author: Baran Joachim
Bolleman Jerven T.
Bonnal Raoul J. P.
Buels Robert
Cock Peter J. A.
Dumontier Michel
Fujisawa Takatomo
Hoehndorf Robert
Katayama Toshiaki
Mungall Christopher J.
Strozzi Francesco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

BACKGROUND: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. DESCRIPTION: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. CONCLUSIONS: Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
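
As a rough illustration of what describing a location as subject-predicate-object triples can look like, the sketch below builds plain Python tuples using FALDO terms (`location`, `Region`, `begin`, `end`, `ExactPosition`, `position`, `reference`, and the strand position classes). The feature and reference IRIs, the `#region` naming scheme, and both helper functions are assumptions made for this example. A real deployment would use an RDF library and a triple store rather than tuples.

```python
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def region_triples(feature, reference, begin, end, forward=True):
    """Describe where `feature` sits on `reference` as a list of triples."""
    region = feature + "#region"  # hypothetical naming scheme for the region node
    strand = FALDO + ("ForwardStrandPosition" if forward else "ReverseStrandPosition")
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
    ]
    for prop, coord in (("begin", begin), ("end", end)):
        pos = f"{region}-{prop}"
        triples += [
            (region, FALDO + prop, pos),
            (pos, RDF_TYPE, FALDO + "ExactPosition"),
            (pos, RDF_TYPE, strand),
            (pos, FALDO + "position", coord),
            (pos, FALDO + "reference", reference),
        ]
    return triples


def interval(triples, feature):
    """Read the begin and end coordinates back by following the triples."""
    def obj(subject, predicate):
        return next(o for (s, p, o) in triples if s == subject and p == predicate)

    region = obj(feature, FALDO + "location")
    return tuple(
        obj(obj(region, FALDO + prop), FALDO + "position") for prop in ("begin", "end")
    )


# Hypothetical feature on a hypothetical reference sequence.
triples = region_triples("http://example.org/gene1", "http://example.org/chr1", 100, 250)
print(interval(triples, "http://example.org/gene1"))
```

Because the coordinates live in triples rather than in a file-format-specific field, annotations from different sources can be queried with the same pattern.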

Maastricht University Research Portal

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Dataset Descriptions: HCLS Community Profile

Author: Alexiev Vladimir
Ansell Peter
Bader Gary
Bando Asuka
Bolleman Jerven T.
Callahan Alison
Cruz-Toledo José
Gaudet Pascale
Gombocz Erich A.
Gonzalez-Beltran Alejandra
Groth Paul
Haendel Melissa
Ito Maori
Jupp Simon
Juty Nick
Katayama Toshiaki
Kobayashi Norio
Krishnaswami Kalpana
Laibe Camille
Le Novère Nicolas
Lin Simon
Malone James
Miller Michael
Mungall Christopher J.
Rietveld Laurens
Wimalaratne Sarala M.
Yamaguchi Atsuko
Publication venue: World Wide Web Consortium
Publication date: 14/05/2015

Heriot Watt Pure